JSUM: A Multitask Learning Speech Recognition Model for Jointly Supervised and Unsupervised Learning

Authors

Abstract

In recent years, end-to-end speech recognition models have emerged as a popular alternative to the traditional Deep Neural Network-Hidden Markov Model (DNN-HMM) approach. These models map acoustic features directly onto text sequences through a single network architecture, significantly streamlining the construction process. However, training such models typically requires a large quantity of supervised data to achieve good performance, which poses a challenge in low-resource conditions. The use of unsupervised representation learning reduces this requirement. Recent research has focused on techniques that employ joint Connectionist Temporal Classification (CTC) and attention mechanisms, with some work also concentrating on representation learning. This paper proposes a jointly supervised and unsupervised multi-task learning model (JSUM). Our approach leverages a pre-trained wav2vec 2.0 model as a shared encoder and integrates CTC-Attention and generative adversarial training into a unified architecture. The method provides a new solution for low-resource languages that optimally utilizes labeled and unlabeled datasets by combining CTC, attention, and adversarial losses. Furthermore, our proposed method is suitable for both monolingual and cross-lingual scenarios.
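Although the abstract includes no code, the training objective it describes (a shared wav2vec 2.0 encoder feeding a CTC branch, an attention decoder, and an adversarial branch) can be summarized as a weighted sum of three losses. The sketch below is a minimal PyTorch-style illustration, not the authors' implementation; the module name `JSUMLoss`, the interpolation weights, and all tensor shapes are assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of a joint objective combining CTC,
# attention (cross-entropy), and adversarial losses over a shared encoder.
# Interpolation weights and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class JSUMLoss(nn.Module):
    def __init__(self, w_ctc=0.3, w_att=0.6, w_adv=0.1, blank=0):
        super().__init__()
        self.w_ctc, self.w_att, self.w_adv = w_ctc, w_att, w_adv
        self.ctc = nn.CTCLoss(blank=blank, zero_infinity=True)
        self.att = nn.CrossEntropyLoss(ignore_index=-100)
        self.adv = nn.BCEWithLogitsLoss()

    def forward(self, ctc_log_probs, targets, input_lens, target_lens,
                att_logits, att_targets, disc_logits, disc_labels):
        # ctc_log_probs: (T, B, V) log-probabilities from the CTC head.
        loss_ctc = self.ctc(ctc_log_probs, targets, input_lens, target_lens)
        # att_logits: (B, U, V) attention-decoder outputs; att_targets: (B, U) token ids.
        loss_att = self.att(att_logits.transpose(1, 2), att_targets)
        # disc_logits: discriminator scores for the adversarial (unsupervised) branch.
        loss_adv = self.adv(disc_logits, disc_labels)
        return self.w_ctc * loss_ctc + self.w_att * loss_att + self.w_adv * loss_adv
```

In such a setup, labeled utterances would contribute to the CTC and attention terms while unlabeled audio would drive the adversarial term, which is one way the abstract's claim of jointly exploiting labeled and unlabeled data could be realized in practice.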


Related articles

Multitask Learning with CTC and Segmental CRF for Speech Recognition

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labe...
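Although the snippet is truncated, a multitask setup of this kind is commonly trained by interpolating the two sequence-level losses over a shared encoder. A generic form is shown below; the weight λ is an illustrative assumption, not a value from the paper.

```latex
\mathcal{L}_{\mathrm{MTL}} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1-\lambda)\,\mathcal{L}_{\mathrm{SCRF}}, \qquad 0 \le \lambda \le 1
```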


Semi-supervised Multitask Learning for Sequence Labeling

We propose a sequence labeling framework with a secondary training objective, learning to predict surrounding words for every word in the dataset. This language modeling objective incentivises the system to learn general-purpose patterns of semantic and syntactic composition, which are also useful for improving accuracy on different sequence labeling tasks. The architecture was evaluated on a r...
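A secondary objective of this kind is typically added to the tagging loss as a down-weighted language-modeling term for predicting the previous and next words. A generic formulation is sketched below; the weight γ is an illustrative assumption, not taken from the paper.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{label}} + \gamma\,\bigl(\mathcal{L}_{\mathrm{LM}}^{\rightarrow} + \mathcal{L}_{\mathrm{LM}}^{\leftarrow}\bigr)
```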


Cooperative supervised and unsupervised learning algorithm for phoneme recognition in continuous speech and speaker-independent context

Neural networks have traditionally been considered an alternative approach to pattern recognition in general, and speech recognition in particular. There has been much success in practical pattern recognition applications using neural networks, including multi-layer perceptrons, radial basis functions, and self-organizing maps (SOMs). In this paper, we propose a system of SOMs based on the a...


Meta-Unsupervised-Learning: A supervised approach to unsupervised learning

We introduce a new paradigm to investigate unsupervised learning, reducing unsupervised learning to supervised learning. Specifically, we mitigate the subjectivity in unsupervised decision-making by leveraging knowledge acquired from prior, possibly heterogeneous, supervised learning tasks. We demonstrate the versatility of our framework via comprehensive expositions and detailed experiments on...


Tied Multitask Learning for Neural Speech Translation

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task, since higher-level intermediate representations should provide useful information. Second, we apply regularization that encourages transitivity and invertibility...
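The first notion described here, a second-task decoder that receives information from the first-task decoder, can be sketched as an extra cross-attention over the first decoder's hidden states. The module below is a hypothetical illustration under that assumption, not the paper's implementation; residual connections and normalization are omitted for brevity, and all dimensions are made up for the example.

```python
# Hypothetical sketch of a second-task decoder block that attends to the
# encoder states and, additionally, to the first-task decoder's hidden states.
import torch
import torch.nn as nn


class TiedSecondDecoderBlock(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.enc_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.dec1_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, y, enc_states, dec1_states):
        # y: (B, U, d) second-decoder inputs; enc_states: encoder outputs;
        # dec1_states: hidden states produced by the first-task decoder.
        h, _ = self.self_attn(y, y, y)
        h, _ = self.enc_attn(h, enc_states, enc_states)
        h, _ = self.dec1_attn(h, dec1_states, dec1_states)  # the "tied" link
        return self.ff(h)
```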


Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13095239